ESTADISTICA PARA EL ANALISIS POLITICO II


Análisis Factorial I: Exploración


Hacemos analisis factorial para reducir las variables en otras variables resumen. Mientras la clusterización agrupaba filas, la factorización agrupa columnas. Pero, al igual que en clusterización, queremos saber si las nuevas variables tienen un nombre, al cual se le denomina técnicamente variable latente. En esta sesión exploraremos la data a ver qué emerge.

Preparación de Datos:

Para esta sesión trabajaremos con la data de estos links:

library(htmltab)

# links
happyL=c("https://en.wikipedia.org/wiki/World_Happiness_Report",
         '//*[@id="mw-content-text"]/div/table/tbody')
demoL=c("https://en.wikipedia.org/wiki/Democracy_Index", 
        '//*[@id="mw-content-text"]/div/table[2]/tbody')

# carga
happy = htmltab(doc = happyL[1],which  = happyL[2],encoding = "UTF-8")
demo  = htmltab(doc = demoL[1], which  = demoL[2], encoding = "UTF-8")

# limpieza

happy[,]=lapply(happy[,], trimws,whitespace = "[\\h\\v]") # no blanks
demo[,]=lapply(demo[,], trimws,whitespace = "[\\h\\v]") # no blanks

library(stringr) # nombres simples
names(happy)=str_split(names(happy)," ",simplify = T)[,1]
names(demo)=str_split(names(demo)," ",simplify = T)[,1]


## Formateo

# Eliminemos columnas que no usaremos esta vez:
happy$Overall=NULL
demo[,c(1,9,10)]=NULL

# También debemos tener nombres diferentes en los scores antes del merge:

names(happy)[names(happy)=="Score"]="ScoreHappy" 
names(demo)[names(demo)=="Score"]="ScoreDemo"


# Tipo de variables:

## En demo:
demo[,-c(1)]=lapply(demo[,-c(1)],as.numeric)

# En happy:
happy[,-c(1)]=lapply(happy[,-c(1)],as.numeric)

# sin perdidos:
happy=na.omit(happy)
demo=na.omit(demo)

Presta atención al merge. Usualmente hacemos merge por default y luego perdemos varias filas:

nrow(merge(happy,demo))
## [1] 145

Hagamos un nuevo merge, donde nos quedemos con TODOS los paises que no estab en uno u otro data frame:

HappyDemo=merge(happy,demo,all.x=T, all.y=T)

Esta vez HappyDemo tiene varios paises de más, pero con valores perdidos y nombres que no pudieron coincidir. Veamos:

# formateando a 
# HappyDemo[!complete.cases(HappyDemo),]

library(knitr)
library(kableExtra)
kable(HappyDemo[!complete.cases(HappyDemo),],type='html')%>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                  font_size = 10)
Country ScoreHappy GDP Social Healthy Freedom Generosity Perceptions ScoreDemo Elec­toral Functio­ning Poli­ticalpartici­pation Poli­ticalculture Civilliber­ties
14 Belize 5.956 0.807 1.101 0.474 0.593 0.183 0.089 NA NA NA NA NA NA
27 Cape Verde NA NA NA NA NA NA NA 7.88 9.17 7.86 6.67 6.88 8.82
33 Comoros NA NA NA NA NA NA NA 3.71 4.33 2.21 4.44 3.75 3.82
34 Congo (Brazzaville) 4.559 0.682 0.811 0.343 0.514 0.091 0.077 NA NA NA NA NA NA
35 Congo (Kinshasa) 4.245 0.069 1.136 0.204 0.312 0.197 0.052 NA NA NA NA NA NA
38 Cuba NA NA NA NA NA NA NA 3.00 1.08 3.57 3.33 4.38 2.65
41 Democratic Republic of the Congo NA NA NA NA NA NA NA 1.49 0.50 0.71 2.22 3.13 0.88
43 Djibouti NA NA NA NA NA NA NA 2.87 0.42 1.79 3.89 5.63 2.65
48 Equatorial Guinea NA NA NA NA NA NA NA 1.92 0.00 0.43 3.33 4.38 1.47
49 Eritrea NA NA NA NA NA NA NA 2.37 0.00 2.14 1.67 6.88 1.18
51 Eswatini NA NA NA NA NA NA NA 3.03 0.92 2.86 2.22 5.63 3.53
53 Fiji NA NA NA NA NA NA NA 5.85 6.58 5.36 6.11 5.63 5.59
57 Gambia NA NA NA NA NA NA NA 4.31 4.48 4.29 3.33 5.63 3.82
64 Guinea-Bissau NA NA NA NA NA NA NA 1.98 1.67 0.00 2.78 3.13 2.35
65 Guyana NA NA NA NA NA NA NA 6.67 9.17 5.71 6.11 5.00 7.35
84 Kosovo 5.662 0.855 1.230 0.578 0.448 0.274 0.023 NA NA NA NA NA NA
95 Macedonia 5.185 0.959 1.239 0.691 0.394 0.173 0.052 NA NA NA NA NA NA
117 North Korea NA NA NA NA NA NA NA 1.08 0.00 2.50 1.67 1.25 0.00
118 North Macedonia NA NA NA NA NA NA NA 5.87 6.50 5.36 6.67 3.75 7.06
119 Northern Cyprus 5.835 1.229 1.211 0.909 0.495 0.179 0.154 NA NA NA NA NA NA
121 Oman NA NA NA NA NA NA NA 3.04 0.00 3.93 2.78 4.38 4.12
123 Palestine NA NA NA NA NA NA NA 4.39 3.83 2.14 7.78 4.38 3.82
124 Palestinian Territories 4.743 0.642 1.217 0.602 0.266 0.086 0.076 NA NA NA NA NA NA
126 Papua New Guinea NA NA NA NA NA NA NA 6.03 6.92 6.07 3.89 5.63 7.65
133 Republic of the Congo NA NA NA NA NA NA NA 3.31 3.17 2.50 3.89 3.75 3.24
144 Somalia 4.982 0.000 0.712 0.115 0.674 0.238 0.282 NA NA NA NA NA NA
147 South Sudan 3.254 0.337 0.608 0.177 0.112 0.224 0.106 NA NA NA NA NA NA
151 Suriname NA NA NA NA NA NA NA 6.98 9.17 6.43 6.67 5.00 7.65
159 Timor-Leste NA NA NA NA NA NA NA 7.19 9.08 6.79 5.56 6.88 7.65
161 Trinidad & Tobago 6.192 1.223 1.492 0.564 0.575 0.171 0.019 NA NA NA NA NA NA
162 Trinidad and Tobago NA NA NA NA NA NA NA 7.16 9.58 7.14 6.11 5.63 7.35
168 United Arab Emirates NA NA NA NA NA NA NA 2.76 0.00 3.93 2.22 5.00 2.65

De lo anterior date cuenta que, por un lado, hay paises que les falta un bloque de indicadores, y que en muchos casos los nombres están mal escritos. Podemos recuperar algunos, pero en la data original:

# cambiemos a nombres usados por otra tabla:
## en demo por happy
demo[demo$Country=="Democratic Republic of the Congo",'Country']="Congo (Kinshasa)"
demo[demo$Country=="Republic of the Congo",'Country']="Congo (Brazzaville)"
demo[demo$Country=="Trinidad and Tobago",'Country']="Trinidad & Tobago"
demo[demo$Country=="North Macedonia",'Country']="Macedonia"

## en happy por demo
happy[happy$Country=="Palestinian Territories",'Country']="Palestine"

Luego de esos ajustes veamos:

HappyDemo=merge(happy,demo) # re creando HappyDemo
nrow(HappyDemo)
## [1] 150

En efecto se recuperaron 5 paises, asi quedará.

Evaluando data

El análisis factorial requiere que hagamos algunas observaciones previas.

  1. Calculemos matriz de correlación:
theData=HappyDemo[,-c(1,2,9)] # sin los Scores ni nombre de país.

# esta es:
library(polycor)
corMatrix=polycor::hetcor(theData)$correlations
  1. Explorar correlaciones:
library(ggcorrplot)

ggcorrplot(corMatrix)

ggcorrplot(corMatrix,
          p.mat = cor_pmat(corMatrix),
          insig = "blank")

Si puedes ver bloques correlacionados, hay esperanza de un buen analisis factorial.

  1. Verificar si datos permiten factorizar:
library(psych)
psych::KMO(corMatrix) 
## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = corMatrix)
## Overall MSA =  0.87
## MSA for each item = 
##                      GDP                   Social                  Healthy 
##                     0.82                     0.92                     0.87 
##                  Freedom               Generosity              Perceptions 
##                     0.86                     0.62                     0.80 
##               Elec­toral             Functio­ning Poli­ticalpartici­pation 
##                     0.81                     0.92                     0.95 
##        Poli­ticalculture          Civilliber­ties 
##                     0.89                     0.84
  1. Verificar si la matriz de correlaciones es adecuada

Aqui hay dos pruebas:

cortest.bartlett(corMatrix,n=nrow(theData))$p.value>0.05
## [1] FALSE
library(matrixcalc)

is.singular.matrix(corMatrix)
## [1] FALSE
  1. Determinar en cuantos factores o variables latentes podríamos redimensionar la data:
fa.parallel(theData,fm = 'ML', fa = 'fa')

## Parallel analysis suggests that the number of factors =  3  and the number of components =  NA

Se sugieren 3, veamos:

  1. Redimensionar a numero menor de factores
library(GPArotation)
resfa <- fa(theData,nfactors = 3,cor = 'mixed',rotate = "varimax",fm="minres")
## 
## mixed.cor is deprecated, please use mixedCor.
print(resfa$loadings)
## 
## Loadings:
##                          MR1    MR3    MR2   
## GDP                       0.268  0.927  0.126
## Social                    0.321  0.663  0.152
## Healthy                   0.345  0.803  0.153
## Freedom                   0.255  0.284  0.516
## Generosity                              0.612
## Perceptions                      0.266  0.683
## Elec­toral                0.935  0.179       
## Functio­ning              0.753  0.429  0.356
## Poli­ticalpartici­pation  0.696  0.341  0.148
## Poli­ticalculture         0.520  0.304  0.507
## Civilliber­ties           0.919  0.317       
## 
##                  MR1   MR3   MR2
## SS loadings    3.404 2.629 1.582
## Proportion Var 0.309 0.239 0.144
## Cumulative Var 0.309 0.548 0.692
print(resfa$loadings,cutoff = 0.51)
## 
## Loadings:
##                          MR1    MR3    MR2   
## GDP                              0.927       
## Social                           0.663       
## Healthy                          0.803       
## Freedom                                 0.516
## Generosity                              0.612
## Perceptions                             0.683
## Elec­toral                0.935              
## Functio­ning              0.753              
## Poli­ticalpartici­pation  0.696              
## Poli­ticalculture         0.520              
## Civilliber­ties           0.919              
## 
##                  MR1   MR3   MR2
## SS loadings    3.404 2.629 1.582
## Proportion Var 0.309 0.239 0.144
## Cumulative Var 0.309 0.548 0.692

Cuando logramos que cada variable se vaya a un factor, tenemos una estructura simple.

fa.diagram(resfa)

  1. Evaluando Resultado obtenido:
resfa$crms
## [1] 0.03691667
resfa$RMSEA
##      RMSEA      lower      upper confidence 
## 0.07968360 0.04056531 0.10959050 0.90000000
resfa$TLI
## [1] 0.9614123
sort(resfa$communality)
##               Generosity                  Freedom              Perceptions 
##                0.3817131                0.4118847                0.5404301 
##                   Social        Poli­ticalculture Poli­ticalpartici­pation 
##                0.5661415                0.6193067                0.6232151 
##                  Healthy             Functio­ning               Elec­toral 
##                0.7878967                0.8784339                0.9071160 
##                      GDP          Civilliber­ties 
##                0.9470977                0.9517927
sort(resfa$complexity)
##               Generosity               Elec­toral                      GDP 
##                 1.039747                 1.073426                 1.206257 
##          Civilliber­ties              Perceptions                  Healthy 
##                 1.254248                 1.310446                 1.439138 
## Poli­ticalpartici­pation                   Social             Functio­ning 
##                 1.558243                 1.566452                 2.074167 
##                  Freedom        Poli­ticalculture 
##                 2.077796                 2.601503
  1. Posibles valores proyectados:

¿Qué nombres les darías?

as.data.frame(resfa$scores)
##             MR1          MR3          MR2
## 1   -0.44581478 -1.368481159 -1.098638556
## 2    0.48033887  0.090566181 -0.856703304
## 3   -1.00967021  0.585592494 -0.898629955
## 4   -1.01725350 -0.201269389 -0.687036544
## 5    0.73624548  0.463261056 -0.807836146
## 6    0.01584794 -0.013422423 -1.259526261
## 7    1.17998664  0.708063729  1.622400763
## 8    0.82303951  0.935375820  0.642938605
## 9   -1.50616013  0.826948485 -0.466857259
## 10  -2.01760261  1.621600785  0.247169356
## 11   0.21131821 -1.001237965  0.199771946
## 12  -1.66996548  0.917390607  0.039008363
## 13   0.79578389  0.823135593  0.752042977
## 14   0.45082762 -1.668920643  0.253673440
## 15  -0.05641176 -0.551594725  0.972541028
## 16   0.50412428 -0.397523850 -0.646284689
## 17   0.08318741  0.219130546 -1.327515737
## 18   1.06057976 -0.053434870 -0.211901993
## 19   0.92310546  0.142136942 -0.826976916
## 20   0.86850458  0.416442028 -1.244491103
## 21   0.06960676 -1.591645737  0.393753312
## 22  -1.01989359 -1.843388059 -0.236504917
## 23  -1.06874704 -0.634535152  1.193457361
## 24  -0.96478982 -0.850195652 -0.051683999
## 25   1.16412448  0.699895023  1.674473964
## 26  -0.82185506 -2.292555222 -0.715561120
## 27  -1.21459683 -1.193530366 -0.601933540
## 28   1.12549099  0.310091463  0.169110288
## 29  -2.02015893  0.930753211  0.730951312
## 30   0.92382717  0.031278951 -0.555349193
## 31  -0.99341039 -0.342515557 -0.440808189
## 32  -1.34392495 -1.731710198 -0.056671606
## 33   1.11687607  0.157292047  0.196185885
## 34   0.53162832  0.480393287 -0.921160352
## 35   0.85011409  0.764563718 -0.554786856
## 36   0.79624528  0.870222852 -0.688471459
## 37   0.94143175  0.714511451  2.121511960
## 38   0.54715397  0.139014157 -0.484753176
## 39   0.50703962 -0.003093982 -0.455381011
## 40  -1.06527183  0.235186366 -0.574312978
## 41   0.68842805 -0.254455499 -1.153543849
## 42   0.89257410  0.571923939  0.195893974
## 43  -1.11306931 -1.184726903  1.091758194
## 44   1.10985513  0.659666822  1.653651197
## 45   0.81540073  0.946585860 -0.223836118
## 46  -1.16044066  0.678685564 -0.957969601
## 47   0.07691544 -0.182138682 -0.664590958
## 48   1.00677308  0.796534944  1.072782633
## 49   0.57933687 -1.062947130  0.102809649
## 50   0.94020552  0.630579743 -1.527306759
## 51   0.39726736 -0.347429130 -0.255826679
## 52  -0.78707349 -1.305764808 -0.336231492
## 53   0.46456787 -1.864571231  0.146446067
## 54   0.55857179 -0.779649140 -0.409560785
## 55  -0.17346845  1.441868311  1.010220747
## 56   0.47287912  0.678256215 -1.060140370
## 57   1.10939209  0.852767876  1.372869501
## 58   0.88601738 -0.785000631  0.101840109
## 59   0.11795711 -0.206917896  0.927890206
## 60  -2.04168760  0.889625721  0.361186027
## 61  -0.95035625  0.548246073 -1.155387122
## 62   0.98365854  1.030709219  1.423727390
## 63   0.11396222  0.866304691  1.406808120
## 64   0.79327364  0.975072334 -0.956233635
## 65  -0.44161397 -1.077594626  0.006215688
## 66   1.04591371 -0.277244425 -0.267832175
## 67   0.76222476  1.001900825  0.153818849
## 68  -0.87467550  0.112047320  0.109949230
## 69  -1.65050509  1.197143904 -0.274185318
## 70  -0.33522854 -0.966870000  1.027485617
## 71  -1.49516709  1.803872673 -0.233903317
## 72   0.04293901 -0.757298467 -0.217984352
## 73  -1.78599653 -0.152032970  0.993701862
## 74   0.97925834  0.456397863 -0.826556978
## 75  -0.77804159  0.538112456 -0.078162481
## 76   0.94805064 -1.522705964 -0.195547331
## 77   0.63831210 -2.262436649 -0.239690918
## 78  -1.68409623  0.712425727 -0.388757551
## 79   1.04840427  0.693005613 -1.315279845
## 80   0.88865588  1.352624510  1.111733995
## 81   0.34657638  0.238754003 -0.733574453
## 82   0.23975441 -1.637992753 -0.252933264
## 83   0.50924510 -2.077682623  0.466171447
## 84   0.15039097  0.505948071  0.394348464
## 85   0.63440877 -1.617044211 -0.370645384
## 86   0.80740644  0.684542762  1.164305786
## 87  -0.34912495 -0.757095434 -0.595649455
## 88   1.11202349  0.179626701  0.506248330
## 89   0.27914706  0.459796315 -0.816176737
## 90   0.69514799 -0.572741827 -0.979962610
## 91   0.71917670 -0.125566004 -0.508518883
## 92   0.21186259  0.422573984 -0.813437625
## 93  -0.45635583 -0.087937715 -0.294497203
## 94  -0.48561718 -1.783217143  0.597131853
## 95  -1.14335379 -0.633287427  1.952906573
## 96   0.45215981 -0.099779520 -0.482248625
## 97   0.08717409 -1.151560004  0.852537682
## 98   0.92601217  0.806327162  1.601247115
## 99   1.26762633  0.471511002  2.002009591
## 100 -0.86360371 -0.173889833  0.193001205
## 101  0.06111228 -2.017396366 -0.389845399
## 102 -0.19638111 -0.770587015 -0.303260964
## 103  1.02148715  1.006708243  1.940850302
## 104 -0.03154804 -0.807150396 -0.230102465
## 105 -0.61081426 -0.232953321 -0.754153058
## 106  0.73062487  0.520443632 -0.681617305
## 107  0.73511658 -0.235024164 -0.469515639
## 108  0.70482067  0.052501923 -0.874432123
## 109  0.72297250 -0.435483094 -0.288170577
## 110  0.58322251  0.726880192 -0.870804685
## 111  0.99402547  0.733652385 -0.644937904
## 112 -2.01905922  2.325144966  0.630535581
## 113  0.60145626  0.544545448 -1.211233623
## 114 -1.42279012  1.248943430 -1.411789904
## 115 -0.90602330 -1.468904392  2.288749138
## 116 -2.34549497  1.837085556 -0.350236398
## 117  0.72638640 -1.398205292  0.210006801
## 118  0.63755954  0.175566272 -0.960254960
## 119  0.29100869 -2.006041810 -0.228120837
## 120 -0.34592659  1.610737584  1.856898063
## 121  0.77684080  0.735887414 -0.978620151
## 122  0.71862466  0.806359213 -0.351334507
## 123  0.74986983 -0.082771999 -0.229027439
## 124  0.81387127  0.840110174 -0.338026089
## 125  0.86650955  0.917892294 -0.285121513
## 126  0.22068499 -0.020427117  0.300412916
## 127 -1.62467476 -0.371226704 -0.280898781
## 128  0.96997873  0.743436655  2.178871590
## 129  0.83345428  0.976979664  1.818733107
## 130 -2.13333912 -0.226400585  0.188606775
## 131  0.97446367  1.032129797 -0.543468218
## 132 -1.78808895 -0.477977648  0.094151298
## 133  0.07830330 -1.310118791  0.599597342
## 134 -0.56502405  0.585696732  0.289474776
## 135 -0.72763414 -1.559224420 -0.121188819
## 136  0.55815746  0.651184758 -0.440521935
## 137  0.09861209  0.105761486 -0.500902501
## 138 -1.22517536  0.961002640 -0.129505220
## 139 -2.31866025  0.943419620 -0.265098142
## 140  0.27386398 -1.633603584  0.420051497
## 141  0.20270976 -0.139502857 -0.874216960
## 142  1.00880031  0.718322029  0.184231055
## 143  0.57524211  1.092845257  0.369168539
## 144  1.29100183  0.219300594  0.406160014
## 145 -2.06860237  0.048333426  1.589742380
## 146 -1.20739186  0.817518450 -1.210069407
## 147 -1.55297428  0.174837293  0.656505739
## 148 -1.75630380 -0.618763549 -0.421561741
## 149  0.34238324 -1.098811528  0.358161254
## 150 -1.01759077 -1.082303889  0.303390169
HappyDemoFA=cbind(HappyDemo[1],as.data.frame(resfa$scores))

library(plotly)


plot_ly(data=HappyDemoFA, x = ~MR1, y = ~MR2, z = ~MR3, text=~Country) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'Demo'),
                     yaxis = list(title = 'Tranquilidad'),
                     zaxis = list(title = 'Bienestar')))

RECORDANDO:

library(fpc)
library(cluster)
library(dbscan)

# YA NO NECESITAS CMD para HappyDemoFA[,c(2:4)]

g.dist.cmd = daisy(HappyDemoFA[,c(2:4)], metric = 'euclidean')
kNNdistplot(g.dist.cmd, k=3)

Para tener una idea de cada quien:

resDB=fpc::dbscan(g.dist.cmd, eps=0.6, MinPts=3,method = 'dist')
HappyDemoFA$clustDB=as.factor(resDB$cluster)
aggregate(cbind(MR1, MR2,MR3) # dependientes
          ~ clustDB, # nivel
          data = HappyDemoFA,    # data
          max)            # operacion
##   clustDB        MR1        MR2        MR3
## 1       0  0.2206850  2.2887491  2.3251450
## 2       1  1.2910018  2.1788716  1.3526245
## 3       2 -0.4563558 -0.2944972  1.2489434
## 4       3 -1.2251754  0.7309513  1.1971439
## 5       4 -0.5650241  0.2894748  0.5856967
## 6       5 -1.6246748  0.1886068 -0.2264006
plot_ly(data=HappyDemoFA, x = ~MR1, y = ~MR2, z = ~MR3, text=~Country, color = ~clustDB) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'Demo'),
                     yaxis = list(title = 'Tranquilidad'),
                     zaxis = list(title = 'Bienestar')))

Aqui acaba la Unidad I, el analisis factorial confirmatorio se verá en la siguiente Unidad.



al INICIO

VOLVER A CONTENIDOS